⚡ Improve `Engine` Performance and Implementation #578

shaneahmed · 2023-03-31T09:55:58Z

Improve Engines performance and implementation
Redesigns PatchPredictor engine using the new EngineABC base class.
WSIs are now processed with the same code path as patches, using a WSI-based dataloader.
The intermediate output is saved as zarr for the WSIs to resolve memory issues.
The output of model architectures should now be a dictionary.
The output can be specified as AnnotationStore for visualisation using TIAViz.
Fix mypy Type Checks for cli/common.py
Add PatchPredictor Engine based on EngineABC
Add return_probabilities option to Params
Removes merge_predictions option in PatchPredictor engine.
Defines post_process_cache_mode which allows running the algorithm on WSI
Add infer_wsi for WSI inference
Removes save_wsi_output as this is not required after post processing.
Removes merge_predictions and fixes docstring in EngineABCRunParams
compile_model is now moved to EngineABC init
Fixes bug with _calculate_scale_factor
Fixes a bug in class_dict definition.
_get_zarr_array is now a public function get_zarr_array in misc
patch_predictions_as_annotations runs the loop on patch_coords instead of class_probs
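
The description above says that patches and WSIs now share one processing path and that model outputs are dictionaries. The sketch below illustrates that idea only. `EngineSketch`, `ThresholdPredictor` and `RowPatchDataset` are hypothetical names invented for this example, not the real `EngineABC`, `PatchPredictor` or dataset classes.

```python
from abc import ABC, abstractmethod


class EngineSketch(ABC):
    """Hypothetical base class (not the real EngineABC)."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size

    @abstractmethod
    def infer_batch(self, batch):
        """Return a dictionary mapping output names to per-item lists."""

    def run(self, dataset):
        # One loop serves any dataset exposing __len__ and __getitem__,
        # whether it holds ready-made patches or cuts them from a slide.
        outputs = {}
        for start in range(0, len(dataset), self.batch_size):
            stop = min(start + self.batch_size, len(dataset))
            batch = [dataset[i] for i in range(start, stop)]
            for key, values in self.infer_batch(batch).items():
                outputs.setdefault(key, []).extend(values)
        return outputs


class ThresholdPredictor(EngineSketch):
    """Toy model: a patch is class 1 when its mean value exceeds 0.5."""

    def infer_batch(self, batch):
        means = [sum(patch) / len(patch) for patch in batch]
        return {
            "probabilities": means,
            "predictions": [int(mean > 0.5) for mean in means],
        }


class RowPatchDataset:
    """Toy WSI-style dataset: cuts fixed-width patches from one row of pixels."""

    def __init__(self, pixels, patch_width):
        self.pixels = pixels
        self.patch_width = patch_width

    def __len__(self):
        return len(self.pixels) // self.patch_width

    def __getitem__(self, idx):
        start = idx * self.patch_width
        return self.pixels[start:start + self.patch_width]
```

A plain list of patches and a `RowPatchDataset` can both be passed to `ThresholdPredictor().run(...)`, which is the point of sharing the loop in the base class.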

codecov · 2023-03-31T10:31:07Z

Codecov Report

❌ Patch coverage is 93.62445% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.07%. Comparing base (ce25587) to head (80e7af5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tiatoolbox/models/dataset/dataset_abc.py | 73.97% | 38 Missing ⚠️ |
| tiatoolbox/models/engine/io_config.py | 56.75% | 32 Missing ⚠️ |
| tiatoolbox/cli/nucleus_instance_segment.py | 66.66% | 1 Missing ⚠️ |
| ...iatoolbox/models/architecture/timm_efficientnet.py | 99.19% | 0 Missing and 1 partial ⚠️ |
| tiatoolbox/utils/misc.py | 97.77% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    99.27%   95.07%   -4.20%     
===========================================
  Files           71       77       +6     
  Lines         9161     9674     +513     
  Branches      1195     1253      +58     
===========================================
+ Hits          9095     9198     +103     
- Misses          40      440     +400     
- Partials        26       36      +10

* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947)

  This PR makes the `WSIPatchDataset` class picklable by delaying the creation of the reader object until the first call to `__getitem__`. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements.

  - Delays reader object instantiation to the first `__getitem__` call instead of during initialization
  - Extracts reader creation logic into a separate `_get_reader` method
  - Stores image path and mode as instance variables for lazy initialization

  Speedup for the WSI prediction cell of the patch_prediction example notebook: 2 min 48 s with 0 loader workers -> 1 min 13 s with 4 workers.

  Note: this PR has no effect on Linux, where multiple loader workers already work because they do not require the dataset to be picklable.

* 🔀 Merge branch develop into dev-engine-abc

* 🐛 Fix reader_info read

---------

Co-authored-by: Mark Eastwood <[email protected]>
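
The lazy-reader pattern described for #947 can be sketched as follows. This is a minimal illustration and not the actual `WSIPatchDataset` code. `FakeReader` and `LazyPatchDataset` are hypothetical names, and the lambda attribute is only a stand-in for an unpicklable resource such as an open file handle.

```python
import pickle


class FakeReader:
    """Hypothetical stand-in for a slide reader that cannot be pickled."""

    def __init__(self, path):
        self.path = path
        # A locally defined lambda cannot be pickled; it plays the role
        # of an OS-level handle held by a real reader.
        self._handle = lambda: None

    def read(self, coords):
        return f"{self.path}:{coords}"


class LazyPatchDataset:
    """Stores only the path; builds the reader on first __getitem__."""

    def __init__(self, img_path, coords):
        self.img_path = img_path
        self.coords = coords
        self.reader = None  # created lazily, so a fresh dataset pickles cleanly

    def _get_reader(self):
        return FakeReader(self.img_path)

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, idx):
        if self.reader is None:
            self.reader = self._get_reader()
        return self.reader.read(self.coords[idx])


def is_picklable(obj):
    """Return True when pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

In this sketch a freshly constructed `LazyPatchDataset` passes `is_picklable`, so it can be sent to worker processes that need a pickled copy. Once `__getitem__` has run, the instance holds a reader and is no longer picklable, which is acceptable here because each worker creates its own reader after receiving its copy.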

# Conflicts: # tests/models/test_feature_extractor.py # tests/models/test_multi_task_segmentor.py # tests/models/test_nucleus_instance_segmentor.py # tests/models/test_patch_predictor.py # tests/models/test_semantic_segmentation.py # tiatoolbox/models/architecture/__init__.py

## Summary of Changes

### Major Additions

- **Dask Integration:**
  - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code.
  - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs.
- **Zarr Output Support:**
  - Added support for saving model predictions and intermediate results directly to Zarr format.
  - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes.
- **SemanticSegmentor Engine:**
  - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`).
  - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI.
- **Enhanced CLI and Config:**
  - Added CLI options for memory threshold, unified worker options, and improved mask handling.
  - Updated YAML configs and sample data for new models and test images.
- **Utilities and Validation:**
  - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`).
  - Improved annotation store conversion for Dask arrays and Zarr-backed outputs.
- **Changes to `kwarg`**
  - Add `memory-threshold`
  - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers`
  - Removed `cache_mode` as cache mode is automatically handled.

---

### Major Removals/Refactors

- **Removed Old CLI and Redundant Code:**
  - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`.
  - Removed legacy cache mode and patch prediction Zarr store tests.
- **Refactored Model and Dataset APIs:**
  - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs.
  - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic.
- **Test Cleanup:**
  - Removed or updated tests that relied on old APIs or cache mode.
  - Refactored test assertions for new output types and Dask array handling.
- **API Consistency:**
  - Standardized function and argument names across engines, CLI, and utility modules.
  - Updated docstrings and type hints for clarity and consistency.

---

### Notable File Changes

- **New:**
  - `tiatoolbox/cli/semantic_segmentor.py`
  - `tests/engines/test_semantic_segmentor.py`
- **Removed:**
  - `tiatoolbox/cli/semantic_segment.py`
  - Old cache mode and patch Zarr store tests
- **Heavily Modified:**
  - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py`
  - CLI modules and test suites
  - Dataset and utility modules for Dask/Zarr compatibility

---

### Impact

- Enables scalable, parallel, and memory-efficient inference and output saving for large images.
- Simplifies downstream analysis by supporting Zarr as a native output format.
- Lays the groundwork for further Dask-based optimizations in TIAToolbox.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
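
The summary above mentions memory thresholding that decides when results are written to Zarr instead of being held in memory. The sketch below shows one way such a decision could be made. It is not the TIAToolbox implementation: the function name `choose_output_backend`, the interpretation of the threshold as a percentage of available memory, and the default of 80 are all assumptions made for illustration.

```python
from math import prod


def choose_output_backend(shape, itemsize, available_bytes, memory_threshold=80):
    """Pick "memory" or "zarr" for an output array (hypothetical helper).

    shape            -- dimensions of the output array
    itemsize         -- bytes per element (e.g. 4 for float32)
    available_bytes  -- free memory, supplied by the caller
    memory_threshold -- assumed to be a percentage of available memory
    """
    required = prod(shape) * itemsize
    budget = available_bytes * memory_threshold / 100
    return "memory" if required <= budget else "zarr"


def rows_per_chunk(shape, itemsize, chunk_bytes):
    """Number of leading-axis rows that fit in one chunk of chunk_bytes."""
    row_bytes = prod(shape[1:]) * itemsize
    # Always write at least one row, and never more rows than exist.
    return max(1, min(shape[0], chunk_bytes // row_bytes))
```

For example, a `(1000, 1000, 3)` float32 output needs 12,000,000 bytes, so with 16,000,000 bytes available and a threshold of 80 it stays in memory, while a `(2000, 1000, 3)` output of the same dtype would be routed to disk.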

## 🚀 Summary

This PR introduces a new **[GrandQC Tissue Detection Model](https://github.com/cpath-ukk/grandqc/tree/main)** for digital pathology quality control and integrates **EfficientNet-based encoder architecture** into the TIAToolbox framework.

---

## ✨ Key Changes

- **New Model Architecture**
  - Added `grandqc.py` implementing a UNet++ decoder with EfficientNet encoder for tissue segmentation.
  - Includes preprocessing (JPEG compression + ImageNet normalization), postprocessing (argmin-based mask generation), and batch inference utilities.
- **EfficientNet Encoder**
  - Added `timm_efficientnet.py` providing configurable EfficientNet encoders with dilation support and custom input channels.
- **Pretrained Model Config**
  - Updated `pretrained_model.yaml` to register `grandqc_tissue_detection_mpp10` with associated IO configuration.
  - Corrected `IOSegmentorConfig` references and adjusted resolutions for SCCNN models.
- **Testing**
  - Added comprehensive unit tests for:
    - `GrandQCModel` functionality, preprocessing/postprocessing, and decoder blocks.
    - EfficientNet encoder utilities and scaling logic.

## Impact

- Enables high-resolution tissue detection for WSI quality control using state-of-the-art architectures.
- Improves flexibility for segmentation tasks with EfficientNet encoders.
- Enhances code quality and consistency through updated linting and formatting tools.

## Tasks

- [x] Re-host GrandQC model weights on TIA Hugging Face
- [x] Update `pretrained_model.yaml`
- [x] Update `requirements.txt`
- [x] Define GrandQC model architecture
- [x] Add example usage
- [x] Remove segmentation-models-pytorch dependency
- [x] Wait for response from GrandQC authors
- [x] Add tests
- [x] Tidy up

---------

Co-authored-by: Shan E Ahmed Raza <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
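
The GrandQC description above names ImageNet normalization as a preprocessing step and argmin-based mask generation as postprocessing. The sketch below illustrates those two operations on plain Python lists. It is not the code in `grandqc.py`: the function names are hypothetical, the JPEG compression step is omitted, and the assumption that the argmin is taken over the class axis (and what each class index means) is not confirmed by the PR text. The mean and standard deviation values are the widely used ImageNet statistics.

```python
# Widely used ImageNet channel statistics for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def argmin_mask(scores):
    """Build a label mask from per-class score maps.

    scores[c][y][x] is the score of class c at pixel (y, x); the mask
    stores, for each pixel, the index of the class with the smallest
    score. Taking the argmin over the class axis is an assumption here.
    """
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            min(range(n_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]
```

`argmin_mask` accepts any number of classes; ties resolve to the lowest class index because `min` returns the first minimal element.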

# 🚀 Summary

This PR introduces a new **`DeepFeatureExtractor` engine** to the TIAToolbox framework, enabling extraction of intermediate CNN feature representations from whole slide images (WSIs) or image patches. These features can be used for downstream tasks such as clustering, visualization, or training other models.

The update also includes:

- A **command-line interface (CLI)** for the new engine.
- Extended **CLI utilities** for flexible input/output configurations.
- Comprehensive **unit tests** covering patch-based and WSI-based workflows, multi-GPU support, and CLI functionality.
- Integration with TIAToolbox’s model registry and CLI ecosystem.

---

## ✨ Key Features

### **New Engine: `DeepFeatureExtractor`**

- Extracts intermediate CNN features from WSIs or patches.
- Outputs feature embeddings and spatial coordinates in **Zarr** or **dict** format.
- Implements **memory-aware caching** for large-scale WSI processing.
- Compatible with:
  - TIAToolbox pretrained models.
  - Torchvision CNN backbones (e.g., ResNet, DenseNet, MobileNet).
  - **All timm architectures via `timm.list_models()`**, including HuggingFace-hosted models.
- Supports both **patch-mode** and **WSI-mode** workflows.

### **CLI Integration**

- Adds `deep-feature-extractor` command to TIAToolbox CLI.
- Supports options for:
  - Input/output paths and file types.
  - Model selection (`resnet18`, `efficientnet_b0`, timm-based backbones, etc.).
  - Patch extraction parameters (`patch_input_shape`, `stride_shape`, `input_resolutions`).
  - Batch size, device selection, memory threshold, overwrite behavior.
- Flexible JSON-based CLI options for resolutions and class mappings.

### **Extended CLI Utilities**

- New reusable options:
  - `--input-resolutions`, `--output-resolutions` (JSON list of dicts).
  - `--patch-input-shape`, `--stride-shape`, `--scale-factor`.
  - `--class-dict` for mapping class indices to names.
  - `--overwrite` and `--output-file` for fine-grained control.

### **Unit Tests**

- **Engine Tests**:
  - Patch-based and WSI-based feature extraction.
  - Validation of Zarr outputs (features and coordinates).
  - Multi-GPU functionality.
- **Model Compatibility**:
  - Tests with `CNNBackbone` and `TimmBackbone` models.
- **CLI Tests**:
  - Single-file and parameterized runs.
  - Validation of JSON parsing for CLI options.

### **Codebase Integration**

- Registers `DeepFeatureExtractor` in `tiatoolbox.models` and engine registry.
- Adds CLI command in `tiatoolbox.cli.__init__.py`.
- Updates architecture utilities to support timm-based backbones and HuggingFace models.
- Introduces dictionaries for Torch and timm backbones (`torch_cnn_backbone_dict`, `timm_arch_dict`).
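
The JSON-based options listed above (`--input-resolutions` as a JSON list of dicts, `--class-dict` mapping class indices to names) can be illustrated with a small parser. This sketch uses the standard-library `argparse` as a stand-in and does not reproduce the TIAToolbox CLI. The helper names, the program name, and the `units`/`resolution` keys in the usage example are assumptions chosen for illustration.

```python
import argparse
import json


def parse_resolutions(text):
    """Parse a JSON list of objects, e.g. one dict per input resolution."""
    value = json.loads(text)
    if not isinstance(value, list) or not all(isinstance(v, dict) for v in value):
        raise argparse.ArgumentTypeError("expected a JSON list of objects")
    return value


def parse_class_dict(text):
    """Parse a JSON object and convert its string keys to integer indices."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    # JSON object keys are always strings, so convert them explicitly.
    return {int(key): str(name) for key, name in value.items()}


def build_parser():
    """Build a hypothetical parser exposing the two JSON-valued options."""
    parser = argparse.ArgumentParser(prog="feature-extractor-sketch")
    parser.add_argument("--input-resolutions", type=parse_resolutions, default=None)
    parser.add_argument("--class-dict", type=parse_class_dict, default=None)
    return parser
```

A call such as `build_parser().parse_args(["--class-dict", '{"0": "background", "1": "tumour"}'])` yields a dictionary keyed by integers, which avoids the string keys that JSON would otherwise impose.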

shaneahmed added 3 commits March 24, 2023 11:18

🔧 Use pyproject.toml for bdist_wheel configuration

ef55e95

- Use `pyproject.toml` for `bdist_wheel` configuration

Merge remote-tracking branch 'origin/develop' into dev-define-engines…

49a0624

…-abc

⚡ Improve Engines performance and implementation

8ba6def

- Improve `Engines` performance and implementation

shaneahmed self-assigned this Mar 31, 2023

shaneahmed added the enhancement New feature or request label Mar 31, 2023

Merge branch 'develop' into dev-define-engines-abc

5cbcfcf

shaneahmed added this to the Release v2.0.0 milestone Apr 10, 2023

shaneahmed mentioned this pull request Apr 19, 2023

🩹 Support for np.ndarray and WSIReader in PatchPredictor #576

Closed

♻️ Refactor engines_abc.py

fac1000

- Refactor engines_abc.py

shaneahmed changed the title ~~⚡ Improve Engines Performance and Implementation~~ ⚡ Improve Engine Performance and Implementation Apr 28, 2023

shaneahmed added 9 commits May 5, 2023 22:17

Merge branch 'develop' into dev-define-engines-abc

a72d9ba

Merge branch 'develop' into dev-define-engines-abc

57ea44a

Merge branch 'develop' into dev-define-engines-abc

6618161

Merge branch 'develop' into dev-define-engines-abc

6996764

Merge branch 'develop' into dev-define-engines-abc

3584f6c

Merge branch 'develop' into dev-define-engines-abc

eada692

Merge branch 'develop' into dev-define-engines-abc

77f1992

Merge branch 'develop' into dev-define-engines-abc

a477d32

Merge branch 'develop' into dev-define-engines-abc

f3e33b9

shaneahmed linked an issue Jul 14, 2023 that may be closed by this pull request

Shifted patches when merging patch predictions! #634

Open

shaneahmed mentioned this pull request Jul 14, 2023

Shifted patches when merging patch predictions! #634

Open

shaneahmed and others added 8 commits July 21, 2023 17:17

Merge branch 'develop' into dev-define-engines-abc

7d35285

Merge branch 'develop' into dev-define-engines-abc

7bad284

[pre-commit.ci] auto fixes from pre-commit.com hooks

36fd629

for more information, see https://pre-commit.ci

Merge branch 'develop' into dev-define-engines-abc

443141c

Merge branch 'develop' into dev-define-engines-abc

b9d8c38

[pre-commit.ci] auto fixes from pre-commit.com hooks

e608f7b

for more information, see https://pre-commit.ci

Merge branch 'develop' into dev-define-engines-abc

1d7f5c0

[pre-commit.ci] auto fixes from pre-commit.com hooks

b956bf5

for more information, see https://pre-commit.ci

shaneahmed and others added 30 commits June 9, 2025 12:29

Merge branch 'develop' into dev-define-engines-abc

7737c1b

Merge branch 'develop' into dev-define-engines-abc

51fbfa8

Merge branch 'develop' into dev-define-engines-abc

7998c03

# Conflicts: # tests/models/test_feature_extractor.py # tiatoolbox/models/models_abc.py

Merge branch 'develop' into dev-define-engines-abc

6f6cb33

🔀 Merge branch 'develop' into dev-define-engines-abc

1edc2b3

# Conflicts: # tiatoolbox/cli/common.py # tiatoolbox/cli/nucleus_instance_segment.py # tiatoolbox/cli/patch_predictor.py # tiatoolbox/models/engine/semantic_segmentor.py

🐛 Fix FBT001

06a9cb0

🐛 Fix mypy checks

f7abbe8

Merge branch 'develop' into dev-define-engines-abc

616fb84

Merge branch 'develop' into dev-define-engines-abc

d2381c0

# Conflicts: # tiatoolbox/models/dataset/classification.py

Merge branch 'develop' into dev-define-engines-abc

283bd22

Merge branch 'develop' into dev-define-engines-abc

56867fc

Merge branch 'develop' into dev-define-engines-abc

f36abe4

Merge branch 'develop' into dev-define-engines-abc

72d3474

# Conflicts: # tests/models/test_patch_predictor.py

🐛 Fix Use a raw string or re.escape() to make the intention explicit

ca49b18

🔀 Merge develop into dev-engine-abc

e3520ba

[pre-commit.ci] auto fixes from pre-commit.com hooks

efdbf4f

for more information, see https://pre-commit.ci

🐛 Fix ruff error

2f1ca4a

[pre-commit.ci] auto fixes from pre-commit.com hooks

cf36794

for more information, see https://pre-commit.ci

🔥 Remove redundant import

050986f

✅ Update tests to use track_tmp_path for clean up

f5a4c35

Merge branch 'develop' into dev-define-engines-abc

31b7995

Merge branch 'develop' into dev-define-engines-abc

67ef0da

Merge branch 'develop' into dev-define-engines-abc

c535eab

Merge branch 'develop' into dev-define-engines-abc

979390c

Merge branch 'develop' into dev-define-engines-abc

b5ba794
